Document made available under the 
Patent Cooperation Treaty (PCT) 



International application number: PCT/EP04/053471 
International filing date: 14 December 2004 (14.12.2004) 



Document type: Certified copy of priority document 

Document details: Country/Office: EP 

Number: 03293216.2 

Filing date: 18 December 2003 (18.12.2003) 



Date of receipt at the International Bureau: 01 April 2005 (01.04.2005) 

Remark: Priority document submitted or transmitted to the International Bureau in 

compliance with Rule 17.1(a) or (b) 




World Intellectual Property Organization (WIPO) - Geneva, Switzerland 
Organisation Mondiale de la Propriete Intellectuelle (OMPI) - Geneve, Suisse 




Europaisches 
Patentamt 



PCT/EP2004/053471



European 
Patent Office 



Office europeen 
des brevets 



Bescheinigung Certificate 



Attestation 



Die angehefteten Unterlagen
stimmen mit der
ursprünglich eingereichten
Fassung der auf dem nächsten
Blatt bezeichneten
europäischen Patentanmeldung
überein.



The attached documents 
are exact copies of the 
European patent application 
described on the following 
page, as originally filed. 



Les documents fixés à
cette attestation sont
conformes à la version
initialement déposée de
la demande de brevet
européen spécifiée à la
page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n° 



03293216.2 



Der President des Europaischen Patentamts; 
Im Auftrag 

For the President of the European Patent Office 

Le President de I'Office europeen des brevets 
p.o. 




R C van Dijk 



EPA/EPO/OEB Form 1014.1 - 02.2000 7001014 




Europaisches European Office europeen 

Patentamt Patent Office des brevets 



Anmeldung Nr: 

Application no.: 03293216.2 



Demande no: 



Anmeldetag: 
Date of filing: 
Date de dépôt:



18. 12. 03 



Anmelder/Applicant(s)/Demandeur(s):

Thomson Licensing S.A. 
46, quai A.Le Gallo 
92100 Boulogne-Billancourt 
FRANCE 

Bezeichnung der Erfindung/Title of the invention/Titre de l'invention:
(Falls die Bezeichnung der Erfindung nicht angegeben ist, siehe Beschreibung.
If no title is shown please refer to the description.
Si aucun titre n'est indiqué se référer à la description.)

Device and method for creating a saliency map of an image 

In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s)
revendiquée(s)

Staat/Tag/Aktenzeichen / State/Date/File no. / Pays/Date/Numéro de dépôt:



Internationale Patentklassifikation/International Patent Classification/
Classification internationale des brevets:



Am Anmeldetag benannte Vertragstaaten/Contracting states designated at date of
filing/Etats contractants désignés lors du dépôt:



AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL
PT RO SE SI SK TR LI



G06T/ 



03293216.2 
EPA/EPO/OEB Form 1014.2 - 01.2000



7001014 




The invention is related to a device and a method for creating a 
saliency map of an image. 

The human information processing system is intrinsically limited, and
this is especially true of the visual system. In spite of the limits of our
cognitive resources, this system has to cope with the huge amount of
information contained in our visual environment. Nevertheless, and
paradoxically, humans seem to succeed in solving this problem, since we are
able to understand our visual environment.

It is commonly assumed that certain visual features are so elementary to 
the visual system that they require no attentional resources to be perceived. 
These visual features are called pre-attentive features. 

According to this tenet of vision research, human attentive behavior is
shared between pre-attentive and attentive processing. As explained before,
pre-attentive processing, so-called bottom-up processing, is linked to
involuntary attention. Our attention is effortlessly drawn to salient parts of our
view. When considering attentive processing, so-called top-down processing,
it has been shown that our attention is linked to a particular task that we have in
mind. This second form of attention is thus more deliberate and powerful,
in the sense that it requires effort to direct our gaze
towards a particular direction.

The detection of the salient points in an image enables the improvement
of further steps such as coding, image indexing, watermarking or video quality
estimation.

The known approaches are more or less based on non-psychovisual
features. In contrast to such methods, the proposed method relies on a
model that is fully based on the human visual system (HVS), for example
through the computation of early visual features.

In a first aspect, the invention proposes a method for creating a 
saliency map of an image comprising the steps of:

- projection of said image according to the luminance component
and if said image is a color image, according to the luminance 
component and according to the chrominance components, 

- perceptual sub-bands decomposition of said components 
according to the visibility threshold of a human eye, 

- extraction of the salient elements of the sub-bands related to the 
luminance component, 

- contour enhancement of said salient elements in each sub-band 
related to the luminance component, 

- calculation of a saliency map from the contour enhancement, for 
each sub-band related to the luminance component 

- creation of the saliency map as a function of the saliency maps 
obtained for each sub-band. 



In a second aspect, the invention proposes a device for creating a
saliency map of an image, characterized in that it comprises means for:

- Projecting said image according to the luminance component
and if said image is a color image, according to the luminance
component and according to the chrominance components,

- Transposing into the frequency domain said luminance and
chrominance signals,

- Decomposing into perceptual sub-bands said components of the
frequency domain according to the visibility threshold of a human
eye,

- Extracting the salient elements of the sub-bands related to the
luminance component,

- Contour enhancing said salient elements in each sub-band
related to the luminance component,

- Calculating a saliency map from the contour enhancement, for
each sub-band related to the luminance component,

- Creating the saliency map as a function of the saliency maps
obtained for each sub-band.

Other characteristics and advantages of the invention will appear
through the description of a non-limiting embodiment of the invention, which
will be illustrated, with the help of the enclosed drawings, wherein:

- Figure 1 represents a general flow-chart of a preferred
embodiment of the method according to the invention applied
to a black and white image,

- Figure 2 represents a general flow-chart of a preferred
embodiment of the method according to the invention applied
to a colour image,

- Figure 3 represents the psychovisual spatial frequency
partitioning for the achromatic component,

- Figure 4 represents the psychovisual spatial frequency
partitioning for the chromatic components,

- Figure 5 represents the Daly Contrast Sensitivity Function,

- Figures 6a and 6b represent respectively the visual masking and
a non-linear model of masking,

- Figure 7 represents the flow-chart of the normalisation step
according to the preferred embodiment,

- Figure 8 represents the inhibition/excitation step,

- Figure 9 represents the profile of the filters to model facilitative
interactions for θ=0,

- Figure 10 represents an illustration of the operator D(z),

- Figure 11 represents the chromatic reinforcement step,

- Figure 12 represents the non-CRF inhibition caused by the
adjacent areas of the CRF flanks,

- Figure 13 represents a profile example of the normalized
weighting function for a particular orientation and radial
frequency.

Figure 1 represents the general flow-chart of a preferred embodiment 
of the method according to the invention applied to a black and white image. 
The algorithm is divided in three main parts.

The first one, named visibility, is based on the fact that the human visual
system (HVS) has a limited sensitivity. For example, the HVS is not able to
perceive with a good precision all signals in our real environment and is
insensitive to small stimuli. The goal of this first step is to take into account
these intrinsic limitations by using perceptual decomposition, contrast
sensitivity functions (CSF) and masking functions.

The second part is dedicated to the perception concept. Perception
is a process that produces from images of the external world a description that
is useful to the viewer and not cluttered with irrelevant information. To select
relevant information, a center-surround mechanism is notably used, in
accordance with biological evidence.

The last step concerns some aspects of the perceptual grouping
domain. Perceptual grouping refers to the human visual ability to extract
significant image relations from lower-level primitive image features without
any knowledge of the image content, and to group them to obtain meaningful
higher-level structure. The proposed method focuses only on contour
integration and edge linking.

Steps E3, E4 are executed on the signal in the frequency domain.
Steps E1, E6 and E9 are done in the spatial domain.
Steps E7 and E8 are done in the frequency or spatial domain. If they
are done in the frequency domain, a Fourier transformation has to be carried
out before step E7 and an inverse Fourier transformation has to be carried out
before step E9.

In step E1, the luminance component is extracted from the considered
image.

In step E2, the luminance component is transposed into the frequency 
domain by using known transformations such as the Fourier transformation in 
order to be able to apply in step E3, the perceptual sub-band decomposition 
on the image. 

In step E3, a perceptual decomposition is applied on the luminance
component. This decomposition is inspired by the cortex transform and
based on the decomposition proposed in the document "The computation of
visual bandwidths and their impact in image decomposition and coding",
International Conference on Signal Processing Applications and Technology,
Santa-Clara, California, pp. 776-770, 1993. This decomposition is done
according to the visibility threshold of a human eye.

The decomposition, based on different psychophysics experiments, is
obtained by carving up the frequency domain both in spatial radial frequency
and orientation. The perceptual decomposition of the component A leads to
17 psychovisual sub-bands distributed on 4 crowns, as shown on figure 3.

The shaded region on figure 3 indicates the spectral support of the
sub-band belonging to the third crown and having an angular selectivity of 30
degrees, from 15 to 45 degrees.
Four domains (crowns) of spatial frequency are labeled from I to IV:

I: spatial frequencies from 0 to 1.5 cycles per degree;

II: spatial frequencies from 1.5 to 5.7 cycles per degree;

III: spatial frequencies from 5.7 to 14.2 cycles per degree;

IV: spatial frequencies from 14.2 to 28.2 cycles per degree.

The angular selectivity depends on the considered frequency domain.
For low frequencies, there is no angular selectivity.

The main properties of these decompositions and the main differences
from the cortex transform are a non-dyadic radial selectivity and an orientation
selectivity that increases with the radial frequency.
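
The layout described above can be summarised in a short sketch. The crown boundaries and the total of 17 sub-bands come from the text; the orientation counts per crown (1, 4, 6, 6) are an assumption inferred from the 30-degree selectivity of crown III, the four channels of crown II mentioned later in the description, and the total of 17. The names `CROWNS`, `crown_of` and `sub_band_count` are hypothetical.

```python
# Crown boundaries in cycles per degree, as listed in the text.
# The orientation counts are an assumption (see above), not quoted from the source.
CROWNS = [
    ("I", 0.0, 1.5, 1),     # no angular selectivity: a single sub-band
    ("II", 1.5, 5.7, 4),    # assumed 45-degree selectivity
    ("III", 5.7, 14.2, 6),  # 30-degree selectivity
    ("IV", 14.2, 28.2, 6),  # assumed 30-degree selectivity
]


def crown_of(frequency_cpd):
    """Return the crown label for a radial frequency, or None if out of range."""
    for label, low, high, _ in CROWNS:
        if low <= frequency_cpd < high:
            return label
    return None


def sub_band_count():
    """Total number of sub-bands over the four crowns."""
    return sum(n for _, _, _, n in CROWNS)
```

With these assumed counts, the total matches the 17 sub-bands stated in the description.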

Each resulting sub-band may be regarded as the neural image
corresponding to a population of visual cells tuned to a range of spatial
frequency and a particular orientation. In fact, those cells belong to the
primary visual cortex (also called striate cortex or V1 for visual area 1). It
consists of about 200 million neurons in total and receives its input from the
lateral geniculate nucleus. About 80 percent of the cells are selective for
orientation and spatial frequency of the visual stimulus.

On the image spatial spectrum, a well-known property of the HVS is
applied, which is known as the contrast sensitivity function (CSF). The CSF
applied is a multivariate function mainly depending on the spatial frequency,
the orientation and the viewing distance.

Biological evidence has shown that visual cells respond to stimuli
above a certain contrast. The contrast value above which a visual cell responds is
called the visibility threshold (above this threshold, the stimulus is visible). This
threshold varies with numerous parameters such as the spatial frequency of
the stimulus, the orientation of the stimulus and the viewing distance. This
variability leads us to the concept of the CSF, which expresses the sensitivity
of the human eyes (the sensitivity is equal to the inverse of the contrast
threshold) as a multivariate function. Consequently, the CSF makes it possible to
assess the sensitivity of the human eyes for a given stimulus.
In step E4, a 2D anisotropic CSF designed by Daly is applied. Such a
CSF is described in the document "The visible differences predictor: an algorithm for
the assessment of image fidelity", in Proceedings of SPIE Human Vision, Visual
Processing and Digital Display III, volume 1666, pages 2-15, 1992.

The CSF enables the modelling of an important property of the
eyes, as the HVS cells are very sensitive to the spatial frequencies.
The Daly CSF is illustrated on figure 5.

Once the Daly function has been applied, an inverse Fourier
transformation is applied on the signal in step E5, in order to be able to apply
the next step E6.

For natural pictures, the sensitivity can be modulated (the visibility
threshold is increased or decreased) by the presence of another stimulus. This
modulation of the sensitivity of the human eyes is called visual masking,
and it is applied in step E6.

An illustration of the masking effect is shown on figures 6a and 6b. Two
cues are considered, a target and a masker, where C_T and C_M are the contrast
threshold of the target in the presence of the masker and the contrast of the
masker respectively. Moreover, C_T0 is the contrast threshold measured by a
CSF (without masking effect).

When C_M varies, three regions can be defined:

• At low values of C_M, the detection threshold remains constant. The
visibility of the target is not modified by the masker.

• When C_M tends toward C_T0, the masker eases the detection of the
target by decreasing the visibility threshold. This phenomenon is
called the facilitative or pedestal effect.

• When C_M increases, the target is masked by the masker. Its
contrast threshold increases.

The visual masking method is based on the detection of a simple signal
such as sinusoidal patterns.

There are several other methods to achieve the visual masking
modeling based on psychophysics experiments: for instance, another method
refers to the detection of quantization noise.

It is obvious that the preferred method is a strong simplification with
regard to the intrinsic complexity of natural pictures. Nevertheless, numerous
applications (watermarking, video quality assessment) are built around such
a principle, with interesting results given its low complexity.

In the context of sub-band decomposition, masking has been
intensively studied, leading to the definition of three kinds of masking: intra-channel
masking, inter-channel masking and inter-component masking.

The intra-channel masking occurs between signals having the same 
features (frequency and orientation) and consequently belonging to the same 
channel. It is the most important masking effect. 

The inter-channel masking occurs between signals belonging to 
different channels of the same component. 

The inter-component masking occurs between channels of different
components (the component A and one chromatic component for example).
These last two kinds of visual masking are put together and are simply called
inter-masking in the following.

For the achromatic component, we used the masking function designed
by Daly in the document entitled "A visual model for Optimizing the Design of
Image Processing Algorithms", in IEEE International Conference on Image
Processing, pages 16-20, 1994, in spite of the fact that this model does not take
into account the pedestal effect. The strength of this model lies in the fact that
it has been optimized with a huge amount of experimental results.
The variation of the visibility threshold is given by:

$$T_{i,j}(m,n) = \left(1 + \left(k_1 \left(k_2 \left|R_{i,j}(m,n)\right|\right)^{s}\right)^{b}\right)^{1/b}$$

where $R_{i,j}$ is a psychovisual channel stemming from the perceptual channel
decomposition (for example, the shaded region on figure 3 leads to the
channel $R_{III,2}$). The values $k_1$, $k_2$, $s$, $b$ are given below:



k1=0.0153 
k2= 392.5 



The table below gives the values of s and b according to the
considered sub-band:

Sub-band   s      b
I          0.75   4
II         1      4
III        0.85   4
IV         0.85   4



We get the signal $R'_{i,j}(x,y)$ at the output of the masking step:

$$R'_{i,j}(x,y) = R_{i,j}(x,y) / T_{i,j}(x,y)$$
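
The masking step can be sketched numerically as follows. The constants and the (s, b) pairs are those given in the text; the function names, the use of NumPy and the point-by-point evaluation on an array are assumptions made for illustration only.

```python
import numpy as np

K1 = 0.0153
K2 = 392.5
# (s, b) per crown, taken from the table in the text.
S_B = {"I": (0.75, 4.0), "II": (1.0, 4.0), "III": (0.85, 4.0), "IV": (0.85, 4.0)}


def masking_threshold(sub_band, crown):
    """Threshold elevation T = (1 + (k1 * (k2 * |R|)**s)**b)**(1/b)."""
    s, b = S_B[crown]
    magnitude = np.abs(np.asarray(sub_band, dtype=float))
    return (1.0 + (K1 * (K2 * magnitude) ** s) ** b) ** (1.0 / b)


def apply_masking(sub_band, crown):
    """Masked sub-band R' = R / T, computed point by point."""
    sub_band = np.asarray(sub_band, dtype=float)
    return sub_band / masking_threshold(sub_band, crown)
```

For weak signals the threshold stays close to 1 and the sub-band is left almost unchanged; for strong signals in crown II (s = 1) the output saturates near 1/(k1·k2).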

Then, in step E7, the normalization step makes it possible to extract the most
important information from the sub-band. Step E7 is detailed on figure 7.

In step S1, a first sub-band $R'_{i,j}(x,y)$ is selected. The steps S2 to S4 and
S8 are carried out for each sub-band $R'_{i,j}(x,y)$ of the 17 sub-bands.
The steps S5 to S7 are done for the second crown (II).

i represents the spatial radial frequency band, i belongs to {I, II, III, IV}.
j represents the orientation, j belongs to {1, 2, 3, 4, 5, 6}.
(x,y) represent the spatial coordinates.

In other embodiments, the different steps can be carried out on all the
sub-bands.

Steps S2 and S3 aim to model the behavior of the classical
receptive field (CRF).

The concept of CRF makes it possible to establish a link between a retinal image
and the global percept of the scene. The CRF is defined as a particular region
of the visual field within which an appropriate stimulation (with preferred
orientation and frequency) provokes a relevant response stemming from the
visual cell. Consequently, by definition, a stimulus in the outer region (called
surround) cannot activate the cell directly.

The inhibition and excitation in steps S2 and S3 are obtained by a
Gabor filter, which is sensitive to the orientation and the frequency.
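The Gabor filter can be represented as follows:

$$\text{gabor}(x, y, \sigma_x, \sigma_y, f, \theta) = G_{\sigma_x,\sigma_y}(x_\theta, y_\theta)\,\cos(2\pi f x_\theta)$$

f being the spatial frequency of the cosine modulation in cycles per
degree (cy/°).

$(x_\theta, y_\theta)$ are obtained by a translation of the original coordinates
by $(x_0, y_0)$ and by a rotation of $\theta$,

$$\begin{bmatrix} x_\theta \\ y_\theta \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}$$

$$G_{\sigma_x,\sigma_y}(x, y) = A \exp\left\{-\left(\frac{x}{2\sigma_x}\right)^2 - \left(\frac{y}{2\sigma_y}\right)^2\right\}$$

A representing the amplitude,

$\sigma_x$ and $\sigma_y$ representing the width of the Gaussian envelope along the x
and y axes respectively.

$$\text{excitation}(x, y, \sigma_x, \sigma_y, f, \theta) = \begin{cases} \text{gabor}(x, y, \sigma_x, \sigma_y, f, \theta) & \text{if } -1/(4f) \le x_\theta \le 1/(4f) \\ 0 & \text{otherwise} \end{cases}$$

In order to obtain elliptic shapes, we take different variances $\sigma_x < \sigma_y$.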



Finally, we get the output:

$$R^{Exc}_{i,j}(x,y) = R'_{i,j}(x,y) * \text{excitation}(x, y, \sigma_x, \sigma_y, f, \theta)$$



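In step S3, the inhibition is calculated by the following formula:

$$\text{inhibition}(x, y, \sigma_x, \sigma_y, f, \theta) = \begin{cases} 0 & \text{if } -1/(4f) \le x_\theta \le 1/(4f) \\ \left|\text{gabor}(x, y, \sigma_x, \sigma_y, f, \theta)\right| & \text{otherwise} \end{cases}$$

And finally:

$$R^{Inh}_{i,j}(x,y) = R'_{i,j}(x,y) * \text{inhibition}(x, y, \sigma_x, \sigma_y, f, \theta)$$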



In step S4, the difference between the excitation and the inhibition is
computed. The positive components are kept, the negative components are set to
"0". This is the following operation:

$$R''_{i,j}(x,y) = \left| R^{Exc}_{i,j}(x,y) - R^{Inh}_{i,j}(x,y) \right|_{>0}$$
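
Steps S2 to S4 can be illustrated with the sketch below. It is a minimal example under stated assumptions, not the implementation of the embodiment: the Gabor envelope is taken as A·exp(−(x/2σx)² − (y/2σy)²), the frequency f is expressed in cycles per pixel rather than cycles per degree, the kernel size must be odd, the convolution is circular (computed with the FFT), and all function names are hypothetical.

```python
import numpy as np


def rotated_coords(size, theta):
    """Rotated coordinates on an odd size x size grid centred on the kernel centre."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    return x_t, y_t


def excitation_inhibition(size, sigma_x, sigma_y, f, theta, amplitude=1.0):
    """Split a Gabor kernel into its central lobe (excitation) and flanks (inhibition)."""
    x_t, y_t = rotated_coords(size, theta)
    envelope = amplitude * np.exp(-(x_t / (2 * sigma_x)) ** 2 - (y_t / (2 * sigma_y)) ** 2)
    gabor = envelope * np.cos(2 * np.pi * f * x_t)
    central = np.abs(x_t) <= 1.0 / (4.0 * f)
    excitation = np.where(central, gabor, 0.0)
    inhibition = np.where(central, 0.0, np.abs(gabor))
    return excitation, inhibition


def convolve_same(image, kernel):
    """Circular 2-D convolution via the FFT, kernel centred, output same size as image."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def normalise(sub_band, size, sigma_x, sigma_y, f, theta):
    """Step S4: keep the positive part of excitation minus inhibition."""
    exc, inh = excitation_inhibition(size, sigma_x, sigma_y, f, theta)
    diff = convolve_same(sub_band, exc) - convolve_same(sub_band, inh)
    return np.maximum(diff, 0.0)
```

The two kernels never overlap: the excitation is non-zero only inside the central lobe of the cosine and the inhibition only outside it, which is what the two piecewise definitions express.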



In step S5, for each orientation, for each sub-band of the second
domain, two convolution products are calculated:

$$L^0_{i,j}(x,y) = R''_{i,j}(x,y) * B^0_{i,j}(x,y)$$

$$L^1_{i,j}(x,y) = R''_{i,j}(x,y) * B^1_{i,j}(x,y)$$

$B^0_{i,j}(x,y)$ and $B^1_{i,j}(x,y)$ are 2 half-butterfly filters. The profile of these
filters, which allows the modelling of facilitative interactions, is given on figure 9 for θ=0.
These filters are defined by using a bipole/butterfly filter.

It consists of a directional term $D_\theta(x,y)$ and a proximity term
generated by a circle $C_r$ blurred by a Gaussian filter $G_{\sigma_x,\sigma_y}(x,y)$:

$$B_{\theta_{i,j},\alpha,r,\sigma}(x,y) = D_{\theta_{i,j}}(x,y) \cdot C_r * G_{\sigma_x,\sigma_y}(x,y)$$

with

$$D_{\theta_{i,j}}(x,y) = \begin{cases} \cos\left(\dfrac{\pi/2}{\alpha}\,\varphi\right) & \text{if } \varphi < \alpha \\ 0 & \text{otherwise} \end{cases}$$

and $\varphi = \arctan(y'/x')$, where $(x', y')^T$ is the vector $(x, y)^T$ rotated by $\theta_{i,j}$. The
parameter $\alpha$ defines the opening angle $2\alpha$ of the bipole filter. It depends on
the angular selectivity $\gamma$ of the considered sub-band. We take $\alpha = 0.4 \times \gamma$. The
size of the bipole filter is about twice the size of the CRF of a visual cell.

In step S6, we compute the facilitative coefficient:

$$f^{iso}_{i,j}(x,y) = D\left(\frac{L^1_{i,j}(x,y) + L^0_{i,j}(x,y)}{\max\left(\beta, \left|L^1_{i,j}(x,y) - L^0_{i,j}(x,y)\right|\right)}\right)$$

with,

$\beta$ a constant,

$$D(z) = \begin{cases} 0 & z \le s_1 \\ \alpha_1 & z \le s_2 \\ \;\vdots \\ \alpha_{N-1} & z \le s_{N-1} \end{cases} \quad \text{where } \alpha_i \le 1,\; i \in [0 \ldots N-1]$$

An illustration of the operator D(z) is given on figure 10.

To ease the application of the facilitative coefficient, the operator D(z)
ensures that the facilitative coefficient is piecewise constant, as shown on figure
10.

In step S7, the facilitative coefficient is applied to the normalized result
obtained in step S4:

$$R'''_{i,j}(x,y) = R''_{i,j}(x,y) \times \left(1 + f^{iso}_{i,j}(x,y)\right)$$
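
Steps S6 and S7 can be sketched as follows, assuming the facilitative coefficient is the staircase operator D applied to the ratio (L1 + L0) / max(β, |L1 − L0|). The threshold values, the levels, the value of β and the function names are all hypothetical, since the description gives no numerical values; values of z above the last threshold are assumed to receive the last level.

```python
import numpy as np

# Hypothetical thresholds s_k and levels alpha_k (alpha_k <= 1); not from the source.
THRESHOLDS = np.array([0.5, 1.0, 2.0])
LEVELS = np.array([0.0, 0.25, 0.5, 1.0])
BETA = 1e-3  # hypothetical value for the constant beta


def staircase(z, thresholds=THRESHOLDS, levels=LEVELS):
    """Piecewise-constant D(z): levels[k] when thresholds[k-1] < z <= thresholds[k]."""
    index = np.searchsorted(thresholds, z, side="left")
    return levels[index]


def facilitative_coefficient(l0, l1, beta=BETA):
    """Step S6: large when both half-butterfly responses are strong and balanced."""
    ratio = (l1 + l0) / np.maximum(beta, np.abs(l1 - l0))
    return staircase(ratio)


def apply_facilitation(r2, l0, l1, beta=BETA):
    """Step S7: multiply the normalised sub-band by (1 + coefficient)."""
    return r2 * (1.0 + facilitative_coefficient(l0, l1, beta))
```

Because D is piecewise constant, the reinforcement takes only a small number of distinct values, which is the stated purpose of the operator.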

Going back to step E8 of figure 1, after step S7 of figure 7, the four
saliency maps obtained for the domain II are combined to get the whole
saliency map according to the following formula:

$$\text{fixation}(x,y) = \alpha \times R'''_{II,0}(x,y) + \beta \times R'''_{II,1}(x,y) + \chi \times R'''_{II,2}(x,y) + \delta \times R'''_{II,3}(x,y)$$

$\alpha$, $\beta$, $\chi$, $\delta$ represent weighting coefficients which depend on the
application (watermarking, coding...).
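
The combination of step E8 is a weighted sum, which can be sketched as below. The function name and the stacking of the four maps in a single array are assumptions; the weights depend on the application and no values are given in the description.

```python
import numpy as np


def combine_maps(maps, weights):
    """Weighted sum of the four orientation maps of crown II.

    maps has shape (4, H, W); weights holds the four weighting coefficients.
    """
    maps = np.asarray(maps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, maps, axes=1)
```

For example, equal weights `[1, 1, 1, 1]` (a hypothetical choice) simply add the four maps.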

In other embodiments, the saliency map can be obtained by a 
calculation using the whole 17 sub-bands and not only the sub-bands of 
domain II. 

Figure 2 represents the general flow-chart of a preferred embodiment
of the method according to the invention applied to a colour image.

Steps T1, T4, T'4, T"4, T5 and T8 are done in the spatial domain.

Steps T2, T'2, T"2, T3, T'3, T"3 are done in the frequency domain.

A Fourier transformation is applied on the three components between step
T1 and steps T2, T'2, T"2.

An inverse Fourier transformation is applied between T3,
T'3, T"3 and T4, T'4, T"4 respectively.

Steps T6 and T7 can be done in the frequency or spatial domain. If
they are done in the frequency domain, a Fourier transformation is done on
the signal between steps T5 and T6 and an inverse Fourier transformation is
done between steps T7 and T8.






Step T1 consists in converting the RGB luminances into
Krauskopf's opponent-colors space, composed of the cardinal directions A,
Cr1 and Cr2.

This transformation to the opponent-colors space is a way to
decorrelate color information. In fact, it is believed that the brain uses 3
different pathways to encode information: the first conveys the luminance
signal (A), the second the red and green components (Cr1) and the third the
blue and yellow components (Cr2).

These cardinal directions are in close correspondence with
signals stemming from the three types of cones (L, M, S).

Each of the three components RGB firstly undergoes a power-law
non-linearity (called gamma law) of the form $x^{\gamma}$ with $\gamma \approx 2.4$. This step is
necessary in order to take into account the transfer function of the display
system. The CIE (French acronym for "commission internationale de
l'éclairage") XYZ tristimulus values, which form the basis for the conversion to
an HVS color space, are then computed by the following equation:

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 0.412 & 0.358 & 0.180 \\ 0.213 & 0.715 & 0.072 \\ 0.019 & 0.119 & 0.950 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}$$



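The responses of the (L, M, S) cones are computed as follows:

$$\begin{pmatrix} L \\ M \\ S \end{pmatrix} = \begin{pmatrix} 0.240 & 0.854 & -0.0448 \\ -0.389 & 1.160 & 0.085 \\ -0.001 & 0.002 & 0.573 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$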



From the LMS space, one has to obtain an opponent color space.
There is a variety of opponent color spaces, which differ in the way they combine
the different cone responses. The color space designed by Krauskopf
has been validated experimentally, and it is given by the
following transformation:

$$\begin{pmatrix} A \\ Cr1 \\ Cr2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ -0.5 & -0.5 & 1 \end{pmatrix} \begin{pmatrix} L \\ M \\ S \end{pmatrix}$$

Then, in step T2, a perceptual decomposition is applied to the
luminance component. Before step T2 and after step T1, the
luminance component is transposed into the frequency domain by using
known transformations such as the Fourier transformation, in order to be able
to apply, in step T2, the perceptual sub-band decomposition on the image.

The perceptual sub-band decomposition of step T2 is the same as
step E3 of figure 1, described earlier, and thus will not be described again here.
Concerning the decomposition of the chromatic components Cr1 and Cr2
in steps T'2 and T"2, as shown on figure 4, the decomposition leads to 5
psychovisual sub-bands for each of these components, distributed on 2
crowns. Before steps T'2, T"2 and after step T1, the chrominance
components are transposed into the frequency domain by using known
transformations such as the Fourier transformation, in order to be able to apply,
in steps T'2 and T"2, the perceptual sub-band decomposition on the image.
Two domains of spatial frequency are labeled from I to II:
I: spatial frequencies from 0 to 1.5 cycles per degree,
II: spatial frequencies from 1.5 to 5.7 cycles per degree.
In steps T3, T'3 and T"3, a contrast sensitivity function (CSF) is applied.

In step T3, the same contrast sensitivity function as in step E4 of figure 1 is
applied to the luminance component and thus will not be described here.

In steps T'3 and T"3, the same CSF is applied on the two chromatic
components Cr1 and Cr2. On the two chromatic components, a
two-dimensional anisotropic CSF designed by Le Callet is applied. It is described
in the document « Critères objectifs avec références de qualité visuelle des
images couleurs » by P. Le Callet, University of Nantes, 2001.

This CSF uses two low-pass filters with a cut-off frequency of about 5.5
cycles per degree and 4.1 cycles per degree for the Cr1 and Cr2
components respectively.

In order to permit the direct comparison between early visual features
stemming from different visual modalities (achromatic and chromatic
components), the sub-bands related to the visibility are weighted. The visibility
threshold is defined as the contrast of the stimulus at a particular point for
which the stimulus just becomes visible.

An inverse Fourier transformation is then applied on the different
components (not shown on figure 2) in order to be able to apply the masking
in the spatial domain.

Then, an intra masking is applied on the different sub-bands for the
chromatic components Cr1 and Cr2 during steps T'4 and T"4 and for the
achromatic component in step T4. This last step has already been explained
in the description of figure 1, step E6. Thus, it will not be described again
here.

Intra-channel masking is incorporated as a weighting of the outputs of
the CSF function. Masking is a very important phenomenon in perception as it
describes interactions between stimuli. In fact, the visibility threshold of a
stimulus can be affected by the presence of another stimulus.

Masking is strongest between stimuli located in the same perceptual
channel or in the same sub-band. We apply the intra masking function
designed by Daly on the achromatic component, as described for figure 1,
step E6, and, on the color components, the intra masking function described in
the document of P. Le Callet and D. Barba, "Frequency and spatial pooling of
visual differences for still image quality assessment", in Proc. SPIE Human
Vision and Electronic Imaging Conference, San Jose, CA, Vol. 3959, January
2000.



These masking functions consist of a non-linear transducer, as expressed
in the document of Legge and Foley, "Contrast Masking in Human Vision",
Journal of the Optical Society of America, Vol. 70, N° 12, pp. 1458-1471,
December 1980.

Visual masking is strongest between stimuli located in the same
perceptual channel (intra-channel masking). Nevertheless, as shown in
numerous studies, there are several interactions, called inter-component
masking, providing a masking or a pedestal effect. From psychophysics
experiments, significant inter-component interactions involving the chromatic
components have been identified. Consequently, the sensitivity of the achromatic
component could be increased or decreased by the Cr1 component. The
influence of Cr2 on the achromatic component is considered insignificant.
Finally, Cr1 can also modulate the sensitivity of the Cr2 component (and vice
versa).

Then in step T5, a chromatic reinforcement is done. 

Colour is one of the strongest attractors of attention, and the
invention takes advantage of this attraction strength by putting forward
the following property: the existence of regions showing a sharp colour and
fully surrounded by areas having quite different colours implies a particular
attraction to the borders of this region.

To avoid the difficult issue of aggregating measures stemming from
achromatic and chromatic components, the colour facilitation consists in
enhancing the saliency of achromatic structures by using a facilitative
coefficient computed on the low frequencies of the chromatic components.

In the preferred embodiment, only a subset of the set of
achromatic channels is reinforced. This subset contains 4 channels having an
angular selectivity equal to $\pi/4$ and a spatial radial frequency (expressed
in cyc/deg) belonging to [1.5, 5.7]. These channels are denoted $R_{i,j}$,
where i represents the spatial radial frequency and j pertains to the
orientation. In our example, j belongs to $\{0, \pi/4, \pi/2, 3\pi/4\}$. In
order to compute a facilitative coefficient, one determines, for each pixel of
the low frequency of Cr1 and Cr2, the contrast value related to the content of
adjacent areas and to the current orientation of the reinforced achromatic
channel, as illustrated on figure 11. On figure 11, the contrast value is
obtained by computing the absolute difference between the average value of the
set A and the average value of the set B. The sets A and B belong to the low
frequency of Cr1 or Cr2 and are oriented in the preferred orientation of the
considered achromatic channel.

The chromatic reinforcement is achieved by the following equation, on
an achromatic (luminance) channel $R_{i,j}(x,y)$:

$$R^{(1)}_{i,j}(x,y) = R_{i,j}(x,y) \times \left(1 + |A-B|_{Cr1} + |A-B|_{Cr2}\right)$$

where,

$R^{(1)}_{i,j}(x,y)$ represents the reinforced achromatic sub-band,
$R_{i,j}(x,y)$ represents an achromatic sub-band,

$|A-B|_k$ represents the contrast value computed around the current
point on the chromatic component k in the preferred orientation of the
sub-band $R_{i,j}(x,y)$, as shown on figure 7. In the embodiment, the sets A
and B belong to the sub-band of the first crown (low frequency sub-band) of the
chromatic component k with an orientation equal to $\pi/4$.
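
The contrast computation and the reinforcement can be illustrated by the following minimal Python sketch. It assumes that the reinforcement multiplies each achromatic coefficient by one plus the two chromatic contrast values. The geometry of the sets A and B is only given by the figures, so the offset lists `A_OFFSETS` and `B_OFFSETS`, the border clamping and all function names are hypothetical choices made for the illustration.

```python
def mean_at(plane, x, y, offsets):
    """Average of `plane` over the (dx, dy) offsets around (x, y).

    Coordinates falling outside the plane are clamped to the border
    (an assumption; the text does not describe border handling).
    """
    h, w = len(plane), len(plane[0])
    total = 0.0
    for dx, dy in offsets:
        xx = min(max(x + dx, 0), w - 1)
        yy = min(max(y + dy, 0), h - 1)
        total += plane[yy][xx]
    return total / len(offsets)


def contrast_value(plane, x, y, offsets_a, offsets_b):
    """|mean(A) - mean(B)| computed around the current point."""
    return abs(mean_at(plane, x, y, offsets_a) - mean_at(plane, x, y, offsets_b))


def reinforce(r_ij, cr1_low, cr2_low, offsets_a, offsets_b):
    """R_ij * (1 + |A-B|_Cr1 + |A-B|_Cr2), evaluated pixel by pixel."""
    out = []
    for y in range(len(r_ij)):
        row = []
        for x in range(len(r_ij[0])):
            c1 = contrast_value(cr1_low, x, y, offsets_a, offsets_b)
            c2 = contrast_value(cr2_low, x, y, offsets_a, offsets_b)
            row.append(r_ij[y][x] * (1.0 + c1 + c2))
        out.append(row)
    return out


# Hypothetical sets for the orientation 0: A on the left, B on the right.
A_OFFSETS = [(-2, 0), (-1, 0)]
B_OFFSETS = [(1, 0), (2, 0)]

# Toy data: a vertical colour edge on Cr1, a flat Cr2, a flat achromatic band.
cr1_low = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(3)]
cr2_low = [[0.0] * 6 for _ in range(3)]
r_ij = [[2.0] * 6 for _ in range(3)]
reinforced = reinforce(r_ij, cr1_low, cr2_low, A_OFFSETS, B_OFFSETS)
```

In this toy case the achromatic coefficients are doubled next to the colour edge and left unchanged far from it, which is the intended effect of the colour facilitation.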

In other embodiments, all the sub-bands can be considered.

In step T6, a center/surround suppressive interaction is carried out.
This operation consists first in a step of inhibition/excitation.

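A two-dimensional difference-of-Gaussians (DoG) is used to model the
non-CRF inhibition behavior of cells. The
$DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)$ is given
by the following equation:

$$DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y) = G_{\sigma_x^{inh},\sigma_y^{inh}}(x,y) - G_{\sigma_x^{ex},\sigma_y^{ex}}(x,y)$$

with $G_{\sigma_x,\sigma_y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right)$
a two-dimensional Gaussian.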
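
As an illustration, the two-dimensional Gaussian and the DoG can be sketched in Python as follows. The sketch assumes that the DoG is the inhibitory (wide) Gaussian minus the central (narrow) Gaussian. The values `S_EX` and `S_INH` are hypothetical, since the parameters of the embodiment are determined experimentally.

```python
import math


def gaussian2d(x, y, sx, sy):
    """Normalised two-dimensional Gaussian with extents sx and sy."""
    return math.exp(-x * x / (2 * sx * sx) - y * y / (2 * sy * sy)) / (
        2 * math.pi * sx * sy
    )


def dog(x, y, ex, inh):
    """Difference of Gaussians: surround (inh) minus centre (ex).

    `ex` and `inh` are (sigma_x, sigma_y) pairs.
    """
    return gaussian2d(x, y, *inh) - gaussian2d(x, y, *ex)


# Hypothetical spatial extents, in pixels.
S_EX = (1.0, 1.0)
S_INH = (4.0, 4.0)

centre_value = dog(0.0, 0.0, S_EX, S_INH)    # negative: the centre dominates
surround_value = dog(6.0, 0.0, S_EX, S_INH)  # positive: the surround dominates
```

With this sign convention the DoG is negative inside the CRF center and positive in the surround, so that only the surround remains once negative values are discarded.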

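Parameters $(\sigma_x^{ex}, \sigma_y^{ex})$ and $(\sigma_x^{inh}, \sigma_y^{inh})$
correspond to the spatial extents of the Gaussian envelope along the x and y
axes of the central Gaussian (the CRF center) and of the inhibitory Gaussian
(the surround) respectively. These parameters have been experimentally
determined in accordance with the radial frequency of the second crown (the
radial frequency $f \in [1.5, 5.7]$ is expressed in cycles/degree). Finally,
the non-classical surround inhibition can be modeled by the normalized
weighting function
$w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)$ given by
the following equation:

$$w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y) = \frac{1}{\left\|H\left(DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}\right)\right\|_1} \, H\left(DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x',y')\right)$$

with,

$$H(z) = \begin{cases} 0, & z < 0 \\ z, & z \ge 0 \end{cases}$$

$(x', y')$ is obtained by translating the original coordinate system by
$(x_0, y_0)$ and rotating it by $\theta_{i,j}$ expressed in radian,

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta_{i,j} & \sin\theta_{i,j} \\ -\sin\theta_{i,j} & \cos\theta_{i,j} \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}$$

$\|\cdot\|_1$ denotes the $L_1$ norm, i.e. the absolute value.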
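
A sampled version of the weighting function can be sketched in Python as follows. The sketch assumes that H is a half-wave rectification (negative values set to zero), that the DoG is the inhibitory Gaussian minus the central Gaussian, and that the kernel is centred on the origin. The kernel radius and the spatial extents used in the example are hypothetical.

```python
import math


def gaussian2d(x, y, sx, sy):
    """Normalised two-dimensional Gaussian."""
    return math.exp(-x * x / (2 * sx * sx) - y * y / (2 * sy * sy)) / (
        2 * math.pi * sx * sy
    )


def half_wave(z):
    """Assumed form of H: keep positive values, zero the others."""
    return z if z > 0.0 else 0.0


def rotate(x, y, x0, y0, theta):
    """Translate by (x0, y0), then rotate by theta (radians)."""
    dx, dy = x - x0, y - y0
    return (
        math.cos(theta) * dx + math.sin(theta) * dy,
        -math.sin(theta) * dx + math.cos(theta) * dy,
    )


def weighting_kernel(radius, ex, inh, theta):
    """Sampled H(DoG(x', y')) divided by its L1 norm.

    The result is a (2 * radius + 1) square list of rows, indexed [y][x].
    """
    raw = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            xr, yr = rotate(x, y, 0.0, 0.0, theta)
            d = gaussian2d(xr, yr, *inh) - gaussian2d(xr, yr, *ex)
            row.append(half_wave(d))
        raw.append(row)
    norm = sum(sum(row) for row in raw)  # all entries are >= 0
    return [[v / norm for v in row] for row in raw]


# Hypothetical parameters: anisotropic surround oriented at pi/4.
kernel = weighting_kernel(8, (1.0, 2.0), (4.0, 8.0), math.pi / 4)
```

The resulting kernel is zero at its centre, non-negative everywhere and sums to one, so that convolving a sub-band with it yields a weighted average of the surround activity.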



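Figure 12 shows the structure of non-CRF inhibition.

Figure 13 shows a profile example of the normalized weighting
function $w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)$.

The response $R^{(2)}_{i,j}(x,y)$ of cortical cells to a particular sub-band
$R^{(1)}_{i,j}(x,y)$ is computed by the convolution of the sub-band
$R^{(1)}_{i,j}(x,y)$ with the weighting function
$w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)$:

$$R^{(2)}_{i,j}(x,y) = H\left(R^{(1)}_{i,j}(x,y) - R^{(1)}_{i,j}(x,y) * w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)\right)$$

with $H(z)$ defined as has been described above.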
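
The suppressive interaction can be illustrated by the following Python sketch. It assumes that the response is the half-wave rectified difference between the sub-band and its convolution with the weighting function. The 3x3 ring kernel `RING` is a hypothetical stand-in for the normalized weighting function, and zero padding at the borders is also an assumption.

```python
def convolve_same(img, kernel):
    """Same-size two-dimensional convolution with zero padding.

    The kernel is assumed to have odd dimensions.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y - (j - ry), x - (i - rx)
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[j][i] * img[yy][xx]
            out[y][x] = acc
    return out


def suppress(r1, w):
    """Assumed form H(R1 - R1 * w), with H a half-wave rectification."""
    inhibition = convolve_same(r1, w)
    return [
        [max(a - b, 0.0) for a, b in zip(row_r, row_i)]
        for row_r, row_i in zip(r1, inhibition)
    ]


# Hypothetical surround kernel: zero centre, unit L1 norm.
RING = [
    [0.125, 0.125, 0.125],
    [0.125, 0.0, 0.125],
    [0.125, 0.125, 0.125],
]

# An isolated response survives; a response inside a uniform field is cancelled.
isolated = [[0.0] * 5 for _ in range(5)]
isolated[2][2] = 1.0
uniform = [[1.0] * 5 for _ in range(5)]

r2_isolated = suppress(isolated, RING)
r2_uniform = suppress(uniform, RING)
```

The two toy inputs show the expected behaviour: an isolated structure keeps its full response, whereas a structure surrounded by similar activity is suppressed.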



In step T7, a facilitative interaction is carried out.

This facilitative interaction is usually termed contour enhancement or
contour integration.

Facilitative interactions appear outside the CRF along the preferred
orientation axis. These kinds of interactions are maximal when center and
surround stimuli are iso-oriented and co-aligned. In other words, as shown by
several physiological observations, the activity of a cell is enhanced when the
stimulus within the CRF and a stimulus within the surround are linked to form a
contour.

Contour integration in early visual preprocessing is simulated using
two half butterfly filters $B^0_{i,j}$ and $B^1_{i,j}$. The profiles of these
filters are shown on figure 9 and they are defined by using a bipole/butterfly
filter. It consists of a directional term $D_\theta(x,y)$ and a proximity term
generated by a circle $C_r$ blurred by a Gaussian filter
$G_{\sigma_x,\sigma_y}(x,y)$.



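$$B_{\theta_{i,j},\alpha,r,\sigma}(x,y) = D_{\theta_{i,j}}(x,y) \cdot C_r * G_{\sigma_x,\sigma_y}(x,y)$$

with

$$D_{\theta_{i,j}}(x,y) = \begin{cases} \cos\left(\dfrac{\pi/2}{\alpha}\,\varphi\right) & \text{if } \varphi < \alpha \\ 0 & \text{otherwise} \end{cases}$$

and $\varphi = \arctan(y'/x')$, where $(x', y')^T$ is the vector $(x, y)^T$
rotated by $\theta_{i,j}$. The parameter $\alpha$ defines the opening angle
$2\alpha$ of the bipole filter. It depends on the angular selectivity $\gamma$
of the considered sub-band. One takes $\alpha = 0.4 \times \gamma$. The
size of the bipole filter is about twice the size of the CRF of a visual cell.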
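
The directional term can be sketched in Python as follows. The sketch assumes that the absolute value of the angle is compared with the half opening angle, so that both lobes of the butterfly are symmetric about the preferred axis, and it treats the origin as lying on the axis. Both points are assumptions, as is the function name.

```python
import math


def directional_term(x, y, theta, alpha):
    """cos((pi / 2) * phi / alpha) inside the opening angle, 0 outside.

    phi is the angle between the point, rotated by theta, and the
    preferred axis, folded into [0, pi / 2].
    """
    xr = math.cos(theta) * x + math.sin(theta) * y
    yr = -math.sin(theta) * x + math.cos(theta) * y
    # atan2 with |xr| equals arctan(yr / xr) up to sign and accepts xr == 0.
    phi = abs(math.atan2(yr, abs(xr)))
    if phi < alpha:
        return math.cos((math.pi / 2) * phi / alpha)
    return 0.0


# Angular selectivity of pi / 4, hence alpha = 0.4 * pi / 4.
ALPHA = 0.4 * (math.pi / 4)

on_axis = directional_term(1.0, 0.0, 0.0, ALPHA)    # on the preferred axis
off_axis = directional_term(0.0, 1.0, 0.0, ALPHA)   # orthogonal to the axis
half_way = directional_term(
    math.cos(ALPHA / 2), math.sin(ALPHA / 2), 0.0, ALPHA
)                                                   # half the opening angle
```

The term is maximal on the preferred orientation axis and falls to zero at the edge of the opening angle, which gives the filter its butterfly shape once multiplied by the blurred circle.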

The two half butterfly filters $B^0_{i,j}$ and $B^1_{i,j}$ are then deduced
from the butterfly filter by using appropriate windows.

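For each orientation, sub-band and location, one computes the
facilitative coefficient:

$$f^{iso}_{i,j}(x,y) = D\left(\frac{L^1_{i,j}(x,y) + L^0_{i,j}(x,y)}{\max\left(\beta, \left|L^1_{i,j}(x,y) - L^0_{i,j}(x,y)\right|\right)}\right)$$

with,

$\beta$ a constant,

$L^0_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) * B^0_{i,j}(x,y)$,

$L^1_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) * B^1_{i,j}(x,y)$,

$$D(z) = \begin{cases} 0, & z \le s_1 \\ \alpha_1, & z \le s_2 \\ \;\vdots \\ \alpha_{N-1}, & z \le s_{N-1} \end{cases}$$

where $\alpha_i \le 1$, $i \in [0 \dots N-1]$.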



An illustration of the operator D(z) is given on figure 9. 
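
The computation of the facilitative coefficient at one location can be sketched in Python as follows. The sketch assumes that the argument of D is the sum of the two half-filter responses divided by the larger of the constant and the absolute value of their difference. The thresholds, the levels and the constant are hypothetical values chosen for the example, and returning the last level beyond the last threshold is also an assumption.

```python
def staircase(z, thresholds, levels):
    """Operator D(z): the level attached to the first threshold not below z."""
    for s, a in zip(thresholds, levels):
        if z <= s:
            return a
    return levels[-1]  # assumption: saturate beyond the last threshold


def facilitative_coefficient(l0, l1, beta, thresholds, levels):
    """D((L1 + L0) / max(beta, |L1 - L0|)) at a single location."""
    z = (l1 + l0) / max(beta, abs(l1 - l0))
    return staircase(z, thresholds, levels)


# Hypothetical parameters.
BETA = 0.1
THRESHOLDS = [1.0, 4.0, 16.0]
LEVELS = [0.0, 0.5, 1.0]

balanced = facilitative_coefficient(0.5, 0.5, BETA, THRESHOLDS, LEVELS)
unbalanced = facilitative_coefficient(0.9, 0.1, BETA, THRESHOLDS, LEVELS)
one_sided = facilitative_coefficient(1.0, 0.0, BETA, THRESHOLDS, LEVELS)
```

Equal responses on both sides of the location give a large ratio and therefore a high coefficient, whereas a response on one side only gives a ratio close to one and no facilitation.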



The sub-band $R^{(3)}_{i,j}$ resulting from the facilitative interaction is
finally obtained by weighting the sub-band $R^{(2)}_{i,j}$ by a factor
depending on the ratio of the local maximum of the facilitative coefficient
$f^{iso}_{i,j}(x,y)$ and the global maximum of the facilitative coefficient
computed on all sub-bands belonging to the same range of spatial frequency:

$$R^{(3)}_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) \times \left(1 + \eta^{iso} \times \frac{\max_{(x,y)}\left(f^{iso}_{i,j}(x,y)\right)}{\max_{j}\left(\max_{(x,y)}\left(f^{iso}_{i,j}(x,y)\right)\right)} \, f^{iso}_{i,j}(x,y)\right)$$

From a standard butterfly shape, this facilitative factor makes it possible to
improve the saliency of isolated straight lines. $\eta^{iso}$ controls the
strength of this facilitative interaction.
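
The weighting can be illustrated by the following Python sketch for the sub-bands of one crown. It assumes that each coefficient is multiplied by one plus the strength, times the ratio of the local maximum to the global maximum, times the facilitative coefficient at that location. The dictionary layout (orientation index mapped to a two-dimensional list) and the name `eta` are hypothetical.

```python
def facilitate(r2_by_orientation, f_by_orientation, eta):
    """Apply the facilitative weighting to every orientation of one crown.

    Both arguments map an orientation index j to a two-dimensional list.
    """
    local_max = {
        j: max(max(row) for row in f) for j, f in f_by_orientation.items()
    }
    global_max = max(local_max.values())
    out = {}
    for j, r2 in r2_by_orientation.items():
        # Guard against a crown without any facilitation.
        ratio = local_max[j] / global_max if global_max > 0 else 0.0
        out[j] = [
            [r * (1.0 + eta * ratio * f) for r, f in zip(row_r, row_f)]
            for row_r, row_f in zip(r2, f_by_orientation[j])
        ]
    return out


# Toy crown with two orientations and a 1 x 2 sub-band each.
r2 = {0: [[1.0, 1.0]], 1: [[1.0, 1.0]]}
f_iso = {0: [[1.0, 0.0]], 1: [[0.5, 0.0]]}
r3 = facilitate(r2, f_iso, eta=1.0)
```

In this example the orientation holding the strongest facilitative coefficient is doubled at the facilitated location, while the weaker orientation receives a smaller gain and non-facilitated locations are unchanged.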


In step E8, the saliency map is obtained by summing all the resulting 
sub-bands obtained in step E7. 
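
A minimal Python sketch of this summation is given below. It assumes that all the sub-bands have been brought back to the spatial domain with the same dimensions. The function name and the toy values are hypothetical.

```python
def saliency_map(sub_bands):
    """Pixel-wise sum of a list of equally sized two-dimensional sub-bands."""
    h, w = len(sub_bands[0]), len(sub_bands[0][0])
    out = [[0.0] * w for _ in range(h)]
    for band in sub_bands:
        for y in range(h):
            for x in range(w):
                out[y][x] += band[y][x]
    return out


# Two toy sub-bands of size 1 x 2.
bands = [[[1.0, 0.0]], [[0.5, 2.0]]]
saliency = saliency_map(bands)
```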



In other embodiments of the invention, one can use all the sub-bands
and not only the sub-bands of the second crown.



Although cortical cells tuned to horizontal and vertical orientations are
almost as numerous as cells tuned to other orientations, we do not introduce
any weighting. This feature of the HVS is implicitly mimicked by the
application of the 2D anisotropic CSF.

Claims

1. Method for creating a saliency map of an image characterized in that it 
comprises the steps of : 

- Projection (E1) of said image according to the luminance (A) 
component and if said image is a color image, according to the 
luminance (A) component and according to the chrominance 
components (Cr1 , Cr2), 

- Perceptual sub-bands decomposition (E3, T2, T'2, T"2) of said 
components (A, Cr1, Cr2) according to the visibility threshold of 
a human eye, 

- Extraction (E7) of the salient elements of the sub-bands related 
to the luminance (A) component, 

- Contour enhancement (E8, T7) of said salient elements in each 
sub-band related to the luminance (A) component, 

- Calculation (T7) of a saliency map from the contour 
enhancement, for each sub-band related to the luminance (A) 
component,

- Creation (T8) of the saliency map as a function of the saliency 
maps obtained for each sub-band. 

2. Method according to claim 1 characterized in that it comprises, further 
to the perceptual sub-bands decomposition, 

- a step of achromatic contrast sensitivity function (CSF) for the 
luminance (A) component and if said image is a color image, 

- a step of chromatic sensitivity function for the chromatic 
components (Cr1, Cr2).



3. Method according to claim 2 characterized in that it comprises a step
(E6, T4, T'4, T"4) of visual masking, further to the step of contrast sensitivity
function, for each sub-band of the luminance (A) component and of the
chromatic (Cr1, Cr2) components.

4. Method according to claim 3 characterized in that, when said image is
a color image, it comprises a step (T5) of chromatic reinforcement of the
luminance (A) sub-bands.

5. Method according to any of the preceding claims characterized in that
the perceptual sub-bands decomposition is obtained by carving-up the 
frequency domain both in spatial radial frequency and orientation. 

6. Method according to claim 5 characterized in that the perceptual
decomposition of the luminance (A) component leads to 17 psycho visual
sub-bands distributed on four crowns.

7. Method according to claim 5 or 6 characterized in that the perceptual
decomposition of the chromatic components (Cr1, Cr2) leads to 5 psycho
visual sub-bands distributed on two crowns for each chromatic component
(Cr1, Cr2).

8. Method according to claims 4 to 7 characterized in that the chromatic
reinforcement of the luminance (A) component is done on the sub-bands of
the second crown and based on the sub-bands of the first crown of the
chromatic components (Cr1, Cr2).

9. Device for creating a saliency map of an image characterized in that it
comprises means for:

- Projecting said image according to the luminance (A) component
and if said image is a color image, according to the luminance
(A) component and according to the chrominance components
(Cr1, Cr2),

- Transposing into the frequential domains said luminance and
chrominance signals,

- Decomposing into perceptual sub-bands said components of the
frequential domain according to the visibility threshold of a
human eye,

- Extracting the salient elements of the sub-bands related to the
luminance component,

- Contour enhancing said salient elements in each sub-band
related to the luminance component,

- Calculating a saliency map from the contour enhancement, for
each sub-band related to the luminance component,

- Creating the saliency map as a function of the saliency maps
obtained for each sub-band.

Device and method for creating a saliency map of an image.



Abstract 



The invention concerns a device and a method for creating a saliency
map of an image. It comprises the steps of:

- Projection (E1) of said image according to the luminance (A)
component and if said image is a color image, according to the
luminance (A) component and according to the chrominance
components (Cr1, Cr2),

- Perceptual sub-bands decomposition (E3, T2, T'2, T"2) of said
components (A, Cr1, Cr2) according to the visibility threshold of
a human eye,

- Extraction (E7) of the salient elements of the sub-bands related
to the luminance (A) component,

- Contour enhancement (E8, T7) of said salient elements in each
sub-band related to the luminance (A) component,

- Calculation (T7) of a saliency map from the contour
enhancement, for each sub-band related to the luminance (A)
component,

- Creation (T8) of the saliency map as a function of the saliency
maps obtained for each sub-band.

Fig 2